{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Configuration files: usage and composition"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Loading configuration files"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "When ```DataPrep.Connector.connect()``` is called, the system reads from a configuration folder, which contains information about how to build connection with specific APIs.\n",
    "\n",
    "You can load the configuration files in the following two ways.\n",
    "\n",
    "### 1. Loading existing files from our github repo\n",
    "\n",
    "We maintain a github repo that contains configuration files for more than 20 websites [here](https://github.com/sfu-db/APIConnectors/tree/develop/api-connectors).\n",
    "\n",
    "As an example, with the following code, the system will download the configuration file folder of dblp from our repo, load it, and build the connection.\n",
    "The file is placed at the system temporary file folder.\n",
    "\n",
    "```\n",
    "from dataprep.connector import connect\n",
    "conn = connect(\"dblp\")\n",
    "```\n",
    "\n",
    "```connect()``` provides a parameter called ```update```, which forces downloading of the fresh config files if set to ```True```.\n",
    "\n",
    "\n",
    "### 2. Loading from a local directory\n",
    "\n",
    "You can also load from a local directory that contains your own configuration files. \n",
    "For example, by writing the following code, ```connect()``` will load from the corresponding folder and build the connection.\n",
    "You should have a folder called ```dblp``` in parallel with your code file and that contains the configuration files.\n",
    "\n",
    "```\n",
    "from dataprep.connector import connect\n",
    "conn = connect(\"./dblp\")\n",
    "```\n",
    "\n",
    "When the website API that you want to access is not supported by us, you will want to write your own configuration files. \n",
    "Or when you want to do some modification for the configuration files, you need to first download the configuration files to your local computer, change the files accordingly, and then load it from the local directory.\n",
    "\n",
    "See below for how to create your own configuration folder and files."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Composing the configuration files"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### What is in a configuration folder?\n",
    "\n",
    "```connect()``` access the configurations for an API through a folder.\n",
    "\n",
    "A configuration folder contains two parts: ```_meta.json``` file and the configuration files.\n",
    "The following is the content of the ```_meta.json``` file for the ```dblp``` folder.\n",
    "It describes what are the ```tables``` available for this website.\n",
    "In DataPrep.Connector, we model the data behind each endpoint as a table just like the tables in the DBMS.\n",
    "\n",
    "```\n",
    "{\n",
    "    \"tables\": [\n",
    "        \"publication\"\n",
    "    ]\n",
    "}\n",
    "```\n",
    "\n",
    "The ```_meta.json``` indicates what are the configuration files. \n",
    "In the ```dblp``` folder, it is expected that there is a configuration file called ```publication.json```.\n",
    "The details of configuration files are described in next subsection.\n",
    "\n",
    "In our repo, the folder also contains the test files for the configuration files. \n",
    "It tests if the configuration files can be processed smoothly by ```connect()```."
   ]
  },
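  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The relationship between ```_meta.json``` and the per-table files can be checked with a few lines of Python. The sketch below is only an illustration: ```missing_table_files``` is a hypothetical helper, not part of DataPrep, and it assumes that each table listed under ```tables``` maps to a file named after the table with a ```.json``` extension, as in the ```dblp``` example above. It builds a throwaway folder so that it runs without network access.\n",
    "\n",
    "```python\n",
    "import json\n",
    "import tempfile\n",
    "from pathlib import Path\n",
    "\n",
    "\n",
    "def missing_table_files(folder):\n",
    "    # Hypothetical helper: return the tables listed in _meta.json\n",
    "    # that have no matching <table>.json file in the folder.\n",
    "    folder = Path(folder)\n",
    "    meta = json.loads((folder / '_meta.json').read_text())\n",
    "    return [t for t in meta['tables'] if not (folder / (t + '.json')).is_file()]\n",
    "\n",
    "\n",
    "# Build a throwaway folder that mirrors the dblp layout described above.\n",
    "with tempfile.TemporaryDirectory() as tmp:\n",
    "    config_dir = Path(tmp) / 'dblp'\n",
    "    config_dir.mkdir()\n",
    "    (config_dir / '_meta.json').write_text(json.dumps({'tables': ['publication']}))\n",
    "\n",
    "    # publication.json has not been written yet, so it is reported as missing.\n",
    "    before = missing_table_files(config_dir)\n",
    "\n",
    "    # An empty JSON object stands in for a real configuration file here.\n",
    "    (config_dir / 'publication.json').write_text('{}')\n",
    "    after = missing_table_files(config_dir)\n",
    "\n",
    "print(before)\n",
    "print(after)\n",
    "```\n",
    "\n",
    "For the temporary folder, this prints ```['publication']``` first and ```[]``` once the file exists. The empty ```{}``` placeholder only satisfies the file-name check; it is not a configuration that ```connect()``` could use."
   ]
  },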
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Configuration file\n",
    "\n",
    "A config file is the magic that makes the data available via the simple function calls.\n",
    "The configuration files are reusable.\n",
    "\n",
    "\n",
    "Configuration files describe the settings of an API, such as:\n",
    "\n",
    "* What is the endpoint of the API?\n",
    "* What authorization scheme an API uses? (see authorization scheme section)\n",
    "* What pagination scheme of the API? (see auto-pagination section)\n",
    "* What are the parameters the query support?\n",
    "* What is the schema of the returned results?\n",
    "\n",
    "A tutorial of how to write a configuration file is [here](https://github.com/sfu-db/APIConnectors/blob/develop/CONTRIBUTING.md)\n",
    "\n",
    "Below shows the configuration file of the publication API."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "```\n",
    "{\n",
    "    \"version\": 1,\n",
    "    \"request\": {\n",
    "        \"url\": \"https://dblp.org/search/publ/api?format=json\",\n",
    "        \"method\": \"GET\",\n",
    "        \"params\": {\n",
    "            \"q\": true,\n",
    "            \"h\": false,\n",
    "            \"f\": false,\n",
    "            \"author\": {\n",
    "                \"template\": \"author:{{author | replace(\\\" \\\", \\\"_\\\")}}:\",\n",
    "                \"required\": false,\n",
    "                \"removeIfEmpty\": true,\n",
    "                \"fromKey\": \"author\",\n",
    "                \"toKey\": \"q\"\n",
    "            },\n",
    "            \"name_parts\": {\n",
    "                \"template\": \"author:{{first_name}}_{{last_name}}:\",\n",
    "                \"required\": false,\n",
    "                \"removeIfEmpty\": true,\n",
    "                \"fromKey\": [\n",
    "                    \"first_name\",\n",
    "                    \"last_name\"\n",
    "                ],\n",
    "                \"toKey\": \"q\"\n",
    "            }\n",
    "        },\n",
    "        \"pagination\": {\n",
    "            \"type\": \"offset\",\n",
    "            \"offsetKey\": \"f\",\n",
    "            \"limitKey\": \"h\",\n",
    "            \"maxCount\": 1000\n",
    "        },\n",
    "        \"search\": {\n",
    "            \"key\": \"q\"\n",
    "        }\n",
    "    },\n",
    "    \"examples\": {\n",
    "    \t\"q\": \"'lee'\"\n",
    "    },\n",
    "    \"response\": {\n",
    "        \"ctype\": \"application/json\",\n",
    "        \"tablePath\": \"$.result.hits.hit[*].info\",\n",
    "        \"schema\": {\n",
    "            \"title\": {\n",
    "                \"target\": \"$.title\",\n",
    "                \"type\": \"string\"\n",
    "            },\n",
    "            \"venue\": {\n",
    "                \"target\": \"$.venue\",\n",
    "                \"type\": \"object\"\n",
    "            },\n",
    "            \"publisher\": {\n",
    "                \"target\": \"$.publisher\",\n",
    "                \"type\": \"string\"\n",
    "            },\n",
    "            \"year\": {\n",
    "                \"target\": \"$.year\",\n",
    "                \"type\": \"string\"\n",
    "            },\n",
    "            \"type\": {\n",
    "                \"target\": \"$.type\",\n",
    "                \"type\": \"string\"\n",
    "            },\n",
    "            \"key\": {\n",
    "                \"target\": \"$.key\",\n",
    "                \"type\": \"string\"\n",
    "            },\n",
    "            \"ee\": {\n",
    "                \"target\": \"$.ee\",\n",
    "                \"type\": \"string\"\n",
    "            },\n",
    "            \"url\": {\n",
    "                \"target\": \"$.url\",\n",
    "                \"type\": \"string\"\n",
    "            },\n",
    "            \"authors\": {\n",
    "                \"target\": \"$.authors.author[*].text\",\n",
    "                \"type\": \"object\"\n",
    "            },\n",
    "            \"pages\": {\n",
    "                \"target\": \"$.pages\",\n",
    "                \"type\": \"string\"\n",
    "            },\n",
    "            \"doi\": {\n",
    "                \"target\": \"$.doi\",\n",
    "                \"type\": \"string\"\n",
    "            }\n",
    "        },\n",
    "        \"orient\": \"records\"\n",
    "    }\n",
    "}\n",
    "```"
   ]
  }
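  ,
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "To make two of the settings above concrete, the sketch below mimics them in plain Python. It is not DataPrep's implementation: ```author_query``` and ```offset_pages``` are hypothetical helpers, the name passed to ```author_query``` is a placeholder, and the sketch assumes that ```maxCount``` is the largest number of records a single request may ask for.\n",
    "\n",
    "```python\n",
    "# Values copied from the pagination block of the configuration above.\n",
    "PAGINATION = {'type': 'offset', 'offsetKey': 'f', 'limitKey': 'h', 'maxCount': 1000}\n",
    "\n",
    "\n",
    "def author_query(author):\n",
    "    # Mimics the author template: spaces become underscores and the\n",
    "    # result is wrapped as author:<name>: (destined for the q parameter).\n",
    "    return 'author:' + author.replace(' ', '_') + ':'\n",
    "\n",
    "\n",
    "def offset_pages(total, pagination):\n",
    "    # Yield one dict of query parameters per request needed to fetch\n",
    "    # total records. Assumption: maxCount caps the size of one request.\n",
    "    fetched = 0\n",
    "    while fetched < total:\n",
    "        limit = min(pagination['maxCount'], total - fetched)\n",
    "        yield {pagination['offsetKey']: fetched, pagination['limitKey']: limit}\n",
    "        fetched += limit\n",
    "\n",
    "\n",
    "print(author_query('Jane Doe'))\n",
    "\n",
    "pages = list(offset_pages(2500, PAGINATION))\n",
    "for page in pages:\n",
    "    print(page)\n",
    "```\n",
    "\n",
    "Under these assumptions, fetching 2500 records takes three requests, with ```f``` set to 0, 1000, and 2000 and ```h``` set to 1000, 1000, and 500."
   ]
  }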
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.7.7"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}
